Description

TECHNICAL FIELD 

[0001] This application describes a reliable array of 
distributed computing nodes forming a network which 
includes redundant communication and storage of infor- 
mation in a way to form robust communications and dis- 
tributed read and write operations. The system may also 
use detection of a condition which indicates the need for 
redundancy, and reconfiguration in response to the con- 
dition in order to compensate for the condition. 

BACKGROUND ART 

[0002] Computing and storage over a distributed en- 
vironment has a great potential of leveraging existing 
hardware and software. Such a system would find use 
as a distributed and highly available storage server. 
Possible applications include use as multimedia serv- 
ers, web servers, and database servers. More generally, 
however, a system of this type can be used for any ap- 
plication where information needs to be distributed 
among locations. 

[0003] The challenge, however, is the proper mix of 
connections, monitoring and operation which allows re- 
liability without excessively increasing the cost. 
[0004] It is known how to provide redundant storage 
systems which can compensate for certain faults. One 
example of such a system is the so-called reliable array 
of independent disks or "RAID". Two examples of the 
RAID type system are found in U.S. Patent Numbers 
5,579,475, and 5,412,661. These systems provide re- 
dundant data storage, so that failure of any disk of the 
system will be compensated by redundant data else- 
where in the system. 

[0005] Communication systems are known in which 
each computer in the system ("node") is connected with 
the other nodes. One example is Ethernet, which is a 
bus-based protocol. The computing nodes communi- 
cate via the bus. A server typically stores all of the 
shared data for all the nodes. The nodes may also have 
local data storage. 

[0006] A single network system includes a single Eth- 
ernet link between the nodes and the server. Therefore, 
if any fault occurs in the connection or in the communi- 
cation to the server, or in the server itself, the nodes may 
no longer be able to obtain conventional data access 
services from the server. The nodes are then forced to 
operate in stand alone mode. Those nodes can then on- 
ly operate using data which is available locally. 
[0007] Server based systems which attempt to in- 
crease the reliability of such a system are known. One 
such system uses a dual bus connection. Each comput- 
ing node is provided with two Ethernet connections, us- 
ing two separate Ethernet cards, to two separate buses 
to two separate servers. This is effectively two separate 
systems, each having its full complement of hardware
and storage.

[0008] If either connection or bus has an error, normal 
operation can still continue over the other bus. A system 
with two redundant buses and two redundant servers is 
called dual bus, dual server. Such a dual bus, dual
server system will tolerate any single network fault.
However, such systems usually require that all
information be duplicated on each server.

[0009] Efforts to provide system redundancy include 
the system described in the published document
EP-A-0366935, entitled High Speed Switching System with
Flexible Protocol Capability. This document describes a
high speed switching system that includes switching
planes and that continues running, even if a switching
plane or link is faulty, by using the remaining switching
planes. The switching planes are of distinct types and
for distinct purposes. Moreover, this document does not
concern system reconfiguration and data reconstruction.

DISCLOSURE OF INVENTION 

[0010] The system described in this application
leverages existing hardware and software by using relatively
low power workstations, such as personal computers.
These personal computers are connected by a redundant
connection. The connection can use existing hardware,
e.g. local and/or wide area networks.
[0011] The present application describes a redundant
distributed network system, and server formed from an
array of distributed computing nodes in accordance with 
the claims, which follow. Each of the computing nodes 
stores information in a special redundant way, and also 
runs a protocol ensuring robust communication. 
[0012] The system includes a special architecture and
operation which allows fault tolerance in the network, 
preferably such that some specific number of network 
faults will not affect the operation of the remaining nodes 
of the system. However, no single one of the nodes 
should duplicate the storage of all of the information.
[0013] The server system includes redundant com- 
munication and storage. The redundant storage is ob- 
tained from redundant storage of the information using 
a special redundant coding scheme. 
[0014] The server system also runs a distributed
detection routine which detects system functional states.
One system functional state, for example is a network 
fault. The network fault can include a communication 
fault such as a broken link, or an inoperable node or 
switching device. More generally, however, the system
functional state can be any condition which may prevent 
any operation of the network. The system functional 
state can be compensated by the system redundancy. 
[0015] The server system preferably runs a network 
monitor process which detects the system functional
state. A logical network process reconfigures the sys- 
tem, to make use of the redundancy to compensate for 
the system functional state.
[0016] The system also uses a distributed read and
write system which allows alternative operation in the 
presence of a system fault. This alternative operation 
uses the system redundancy. 

BRIEF DESCRIPTION OF DRAWING 

[0017] The objects, advantages and features of this 
invention will be more readily appreciated from the fol- 
lowing detailed description, when read in conjunction 
with the accompanying drawing, in which: 

Figure 1 shows a basic block diagram of the simplest
networking example;

Figure 2 shows a more complicated example with
more switches and more computing nodes;

Figure 3 shows an even further reliable networking
example;

Figure 4 shows a fault-tolerant system;

Figure 5 shows an example of how this system
would be used to store video;

Figure 6 shows how such a system could tolerate
link faults;

Figure 7 shows a block diagram of a software
architecture for reliable communications;

Figure 8 shows a basic software flowchart of the
network monitor process;

Figure 9 shows a connectivity protocol state
machine for the network monitor process;

Figure 10A shows formation of the data structure
for connectivity;

Figure 10B shows a flowchart of the link status
operation;

Figure 11 shows a flowchart of the RUDP process;

Figure 12 shows a possible arrangement of
computing nodes and switching elements;

Figure 13 shows a more advanced arrangement of
computing nodes and switching elements;

Figures 14A through 14E show calculation of parity
rows in X code for an array code of 5 by 5; and

Figure 15 shows the basic layout of the X code
system.

BEST MODE FOR CARRYING OUT THE INVENTION 

[0018] Figure 1 shows a first, most basic embodiment
of a reliable redundant distributed network server
system. The system is formed of the computing nodes
("nodes") and the network which carries out switching
between the nodes.

[0019] The network of Figure 1 includes both commu- 
nication and storage redundancy among the nodes and 
the network. This redundancy can be used to compen- 
sate for a defined number of system functional states. 
The system functional states which are compensated 
by the redundancy can include faults in the network 
("communication faults"), faults in memory storage 
where the memory could be disks, volatile memory, or
any other kind of memory which stores data ("memory
faults"), or any other kind of fault which produces an
undesired result.

[0020] The distributed server system also includes a
detection process. The detection process operates in
each node to view the connection to other nodes in the
network. Each node views the network according to the
same protocol, using a pool of hints about the condition
of the network. This detection process guarantees that
both sides see the same history of the network. Even
though the detection process is distributed, it keeps
the network history of the nodes of the network
consistent within a desired threshold, using a token passing
system. The tokens limit the degrees of freedom of the
two sides, by allowing only a specified number of actions
without an acknowledgment that the other side has
taken an action.

[0021] The detection process runs invisibly relative to
the other programs and user applications. The preferred
mode of the detection process uses a network monitor
("NETM") process which operates to gather information
about the system being monitored. That NETM process
preferably determines whether the other node is
properly operating. However, more generally, the NETM
process determines a parameter related to usability.
That can include, as in the following, whether the system
is up or down. It could also include an indication of how
busy that system is, which indication could be used for
load balancing.

[0022] The system of Figure 1 illustrates the features
of the invention using four computing nodes ("nodes")
100, 102, 104, and 106 connected by two switches 110
and 112. Each node can communicate with each other
node over two different and hence redundant paths. For
example, node 100 can communicate with node 106 via
interconnection 120 between node 100 and switch 110
and interconnection 122 from switch 110 to node 106.
A totally separate path also exists, which allows
redundant interconnection: node 100 can alternatively
communicate with node 106 using interconnection 124
from node 100 to switch 112 and interconnection 126
from switch 112 to node 106. Each node, therefore, is
connected to each other node by at least two completely
separate and redundant connection paths.

[0023] This redundant communication capability al- 
lows selection of a different path in case it is preferable 
to avoid use of one communications link. For example, 
loss of switch 110 or any part of the line of 120 and/or
122 will still allow communication over lines 124 and 126
via switch 112.

[0024] The information is also stored in a redundant 
manner which allows retrieval of any information, even 
if any part of the network fails or is otherwise unavaila- 
ble, e.g., due to high traffic. The redundant storage 
mechanism is illustrated in Figure 1 as element 140. The
data in redundant storage 140 is preferably stored such
that loss of any n-k nodes, where n is the total number
of nodes in the system and k is a selected number, will
not affect the ability to obtain any desired data from the
system. This is preferably done by storing data accord- 
ing to a maximum distance separable ("MDS") coding 
system which includes stored redundancy information 
in each of the nodes. This redundancy information can 
be used with other node data to reconstruct the data for 
any missing node or nodes. 

[0025] If the detection process determines any kind of
undesirable system functional state, such as an
inoperable node, or a broken communication link, a
reconfiguration process 150 is carried out. The reconfiguration
process 150 is robust against faults by virtue of its ability
to use at least one of the storage redundancy or the
communication redundancy. The reconfiguration process
allows the system to operate in the presence of a
specified fault. This might not, however, require any
dedicated switching. For example, a path between nodes 100
and 106 can be established over path 1 via 120/110/122,
or over path 2 via 124/112/126. Under normal operation,
the communication would alternately occur over path 1,
then path 2, then path 1, etc. However, if there is a fault
or overload in path 1, then all communications will occur
over path 2. This is a reconfiguration in the sense that
the communications are appropriately directed. Even
though half of the communications would have been
directed over path 2 anyway, the reconfiguration makes
all of the communications occur over path 2.
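
The alternation and fallback just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of the specification: the class name PathSelector, its methods, and the tuple representation of a path are hypothetical; only the element numbers 120/110/122 and 124/112/126 come from the text above.

```python
# Paths between node 100 and node 106, written as (link, switch, link).
PATH_1 = (120, 110, 122)
PATH_2 = (124, 112, 126)


class PathSelector:
    """Alternate between redundant paths, skipping paths marked unusable."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.usable = {p: True for p in self.paths}
        self._next = 0  # index of the path to try first on the next send

    def mark(self, path, usable):
        """Record the detection result (fault or overload) for one path."""
        self.usable[path] = usable

    def choose(self):
        """Return the next usable path in rotation order."""
        for offset in range(len(self.paths)):
            idx = (self._next + offset) % len(self.paths)
            path = self.paths[idx]
            if self.usable[path]:
                self._next = (idx + 1) % len(self.paths)
                return path
        raise RuntimeError("no usable path between the two nodes")


selector = PathSelector([PATH_1, PATH_2])
normal = [selector.choose() for _ in range(4)]    # alternates path 1, path 2
selector.mark(PATH_1, usable=False)               # fault or overload on path 1
degraded = [selector.choose() for _ in range(3)]  # every send uses path 2
```

No dedicated switching hardware is modeled here; the "reconfiguration" is only a change in which of the two existing paths the sender selects.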
[0026] Figure 1 therefore illustrates the basic features 
of the distributed server as described by the present 
specification. These features include redundancy of 
communication, redundancy of storage, detection of an 
event which can be compensated by the redundancy, 
and reconfiguration to use the redundancy to compen- 
sate for the event. 

Redundant Communication 

[0027] The Figure 1 system shows a simple redun- 
dant connection with four nodes 100-106 and two 
switches 110 and 112. The nodes are preferably
standalone workstations, such as personal computers
("PCs") each with two PCI bus-based communication
cards. The communication cards communicate via the
switches to similar communication cards in the other
PCs. The protocol of the communication cards could be
any commercially available type, such as Ethernet or
others. The preferred system uses Myrinet switches for
the switching nodes 200 as shown in Figure 2. Myrinet
switches are available for sale commercially, and are
also described in Boden et al., "Myrinet: a gigabit per
second local area network", IEEE Micro, 1995.
[0028] The special node connection used by the
present invention provides a communication
redundancy which improves the ability to operate normally in the
presence of network communication faults. These
network communication faults include faulted
communication of any kind, such as switch faults, broken links, or
switch failures. The connections are established in a way
that minimizes the possibility that any communication
fault or combination of communication faults could cause
a communication disruption or isolation of nodes. The
importance of proper connection is illustrated with
reference to the following.

[0029] Figure 2 shows a system that connects eight 
computing nodes 200 through 214 using four switches 
220 through 226. Every computing node includes two 
possible interconnect link paths. This provides
redundancy of communications.

[0030] Communication failures in the system of
Figure 2, however, have the possibility of undesirably
"isolating" groups of computing nodes. These isolated
groups of computing nodes are isolated in the sense that
they are no longer able to communicate with all of the
other working nodes of the distributed server.
[0031] As an example, if both switches 224 and 226 
were to fail, then the computing nodes 200 to 206 would 
be totally isolated from the computing nodes 208
through 214. This makes the system isolatable, which is
usable, but less preferred.

[0032] For example, consider a case where the
MDS code used requires six of eight nodes to
reconstruct data. If the system were isolated as explained
above, then each node could communicate with only half
of the nodes. Since each isolated group would contain
only four communicable nodes, fewer than the six
required, this particular fault would prevent the data from
being reconstructed.
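
The isolation argument can be checked mechanically. The sketch below is a hedged illustration only: the wiring in LINKS is hypothetical (the exact connections of Figure 2 are not reproduced in the text) and was chosen merely so that losing switches 224 and 226 splits the eight nodes into two halves, as the paragraphs above describe. The function names are likewise illustrative.

```python
from collections import deque

NODES = [200, 202, 204, 206, 208, 210, 212, 214]
LEFT, RIGHT = NODES[:4], NODES[4:]

# Hypothetical wiring: each node has two links, and the two halves are
# joined only through links that touch switch 224 or switch 226.
LINKS = (
    [(n, 220) for n in LEFT] + [(n, 224) for n in LEFT]
    + [(n, 222) for n in RIGHT] + [(n, 226) for n in RIGHT]
    + [(224, 226), (220, 226), (222, 224)]
)


def node_groups(nodes, links, failed):
    """Return the groups of computing nodes that can still reach each other."""
    failed = set(failed)
    node_set = set(nodes)
    adj = {}
    for a, b in links:
        if a in failed or b in failed:
            continue  # a link is unusable if either endpoint has failed
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, groups = set(), []
    for start in nodes:
        if start in failed or start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), []
        while queue:  # breadth-first search over nodes and switches
            v = queue.popleft()
            if v in node_set:
                group.append(v)
            for w in adj.get(v, ()):
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        groups.append(sorted(group))
    return groups


def can_reconstruct(groups, k):
    """True if some group holds at least the k nodes the MDS code needs."""
    return any(len(g) >= k for g in groups)


halves = node_groups(NODES, LINKS, failed={224, 226})
```

With this assumed wiring, `halves` contains two groups of four nodes each, so `can_reconstruct(halves, 6)` is false, which matches the six-of-eight example above.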

[0033] The connectivity structure of Figure 3 is
preferred. This ten node, four switch system has improved
interconnection in the case of communications faults.
The connection interface is made such that loss of any
two switches can affect only two computing nodes in the
worst case. See for example Figure 4, which illustrates
the situation of switches 320 and 326 having failed. The
bolded lines show the communication lines which are
affected by this failure. Only the computing nodes 304
and 312 are isolated by this two-switch failure. All other
nodes remain totally operational, and none of them is
isolated from the others.

[0034] An important part of the fault tolerance is
obtained from the specific interconnection of the switches
and nodes. As an example given above, the Figure 2
system has a possible drawback that it becomes
possible to isolate two halves of the computing nodes. The
isolated system includes computing nodes 200 through
206 which are capable of communicating but are
isolated from the group of nodes 208 through 214.
[0035] Another example of the problem is shown in
Figure 12, which represents one possible way of
interconnecting a number of computing nodes using
switching nodes. Each switching node N is located between
two adjacent computing nodes C. This is a usable, but
less preferred configuration. Note that if computing
nodes 1200 and 1202 ever become simultaneously
faulted, the communication capability of the system will
be split along the dotted lines shown in Figure 12. This
will effectively isolate one-half of the system 1204 from
the other half of the system 1206.
[0036] An object of the connection described in this
specification is to avoid this kind of possible isolation
formed by any two communications failures. The
preferred system describes connecting the nodes in the
most non-local way possible. This compares with the
system of Figure 12 in which each switching node is
connected to the two closest computing nodes. The
inventors found that the unobvious system of connecting
between non-local switches produces the highest degree
of fault tolerance.

[0037] Figure 13 shows such a device. Each node is
shown as connected to two switches. The diagram
depicts the connection as being between any two most
distant switches. When the switches and nodes are laid
out as shown in Figure 13, the connections appear as
diameters, each connecting two of the switches that are
physically most distant from one another. This connection
has the advantage that removal of any three switches
cannot have the effect of isolating two halves of the unit.
On the contrary, breaking the unit in any two places still
allows communication between many of the nodes. Any
three losses isolate only some constant number of nodes
(those directly affected), regardless of the total number
of nodes in the system.
[0038] Assume for example, a communication failure 
at the location 1310 and another break at the location 
1312. It is apparent that nodes can still communicate 
since switch 1300 is still connected to switch 1302 via 
switch 1304. Switch 1300 is also connected to switch 
1306 via switch 1308. In an analogous way, all of these 
switches are connected to one another even if there is 
such a break. Moreover, with this preferred system, the
longest node to node connection that could possibly be
necessary is one quarter of the way around the system.
[0039] The non-locality concept is also applicable to 
arrangements other than a ring. For example, any ar- 
rangement which could be visualized as a ring is alter- 
natively usable. 

[0040] The preferred server system shown in Figures 
1 through 3 uses personal computer-based worksta- 
tions connected via redundant networks using the Myri- 
net interconnect technology. Alternatively, of course, 
other communication technology, such as 100 MB Eth- 
ernet can be used. All of these systems have in common 
the capability of maintaining redundancy in the pres- 
ence of faulty links. The system could be used with any 
number of communications elements, although two is 
the preferred and disclosed number. 

Redundant Storage 

[0041] In the preferred embodiment of Figure 1, each
node stores only a portion of any given stored data. The
stored data is retrieved using the part of the information
that is actually stored in the local node, and a part from
other nodes. An illustration of this concept is shown with
reference to Figure 5. Figure 5 illustrates a video server.
The distributed server provides data indicative of video,
which is displayed as shown. Each computing node is
shown storing half of the total data. The data is
redundantly stored such that any video frame can be
reconstructed from the data in the one node requesting the
data, when it is combined with the data in any other
node.

[0042] This storage scheme allows any node to
receive its desired information so long as that node does
not become isolated from all other nodes. This scheme
would provide storage redundancy for the case of many
failures in the distributed server.
[0043] More generally, however, the preferred
scheme defined herein allows reconstructing data from
any subset of k working nodes out of the total of n nodes.
The example given below includes k=2 and n=4.
[0044] Figure 6 illustrates how the remaining
computing nodes can reconstitute any item of served-out video,
in the case of a node failure. This can be accomplished
by any coding scheme which allows redundant storage.
[0045] The preferred system has the ability to lose any
two communication links without losing any other
communication function of the server system, and without
affecting other nodes besides those which actually
include the faults.

[0046] The redundant memory feature of the system
stores, in each node, encoded data of a smaller size than
the total data, e.g. half the data. Therefore, for each file
of size K in a system with k working nodes, in this
preferred embodiment, K/k of that file is stored on each
node of the server. The remaining (k-1)K/k of the file is
obtained from the other k-1 working nodes.

X-Code

[0047] Storage redundancy is obtained according to
the preferred embodiment by distributing the storage of
the information between nodes. As explained above, for
each item of information of size K (the original size of
the information), the preferred system stores K/k data in
each node, where k is the number of nodes that will be
necessary to reconstruct the data. Each node can
reconstruct any of the items of information by accessing
the other K/k portions of the information from any other
k-1 nodes. The information is preferably stored using a
maximum distance separable ("MDS") code. The
preferred mode of storing the information uses a new
coding system called X-Code. The X-Code as described
herein is a special, but optimized, code for storing each
item of information spread among the nodes, and more
specifically, the disks of the nodes.
[0048] Most preferably, only a part of the information,
some portion of the encoded data, is stored on each
node. Each node also stores information indicating
some property of information on other nodes. For
example, that property could be a checksum or parity,
indicating a sum of data on the other nodes. That information
is used along with the information on the other nodes in
order to reconstruct the information on those other
nodes.

[0049] As described above, the preferred code used 
is X-code, which is described in detail in the following. 
X-code is a Maximum Distance Separable ("MDS") ar- 
ray code of N by N where N is preferably a prime 
number. This code can be both encoded and decoded 
using only exclusive OR ("XOR") and cyclic shift oper- 
ations. This makes X-code much faster to encode and 
decode as compared with more computationally-inten- 
sive codes such as Reed-Solomon codes. 
[0050] The X-Code has a minimum column distance 
of 3. This means that the code can correct either one 
column error or two column erasures. X-code has a spe- 
cific property that the change of a single information unit, 
e.g., a single information bit or symbol in X-code, will
always affect only two parity bits or symbols. Therefore,
whenever any information is updated, only those two 
parity bits or symbols will need to be changed. 
[0051] The system of X-Code uses an array shown in
Figure 15. Each column 1500 maps to a single node and
represents the information in that node. The parity
symbols are stored in rows rather than columns.
[0052] The code is arranged using all the nodes of the
network collectively to form an array of N x n, where N
is preferably equal to n. The array includes (N-2) x n
information symbols, and 2 x n parity symbols. Figure 14A
shows an exemplary array with n=5. The portion 1400 of
the nodes represents the information, with each boxed
element representing one unit of information, e.g. a bit, a
sector or some other unit of a disk. These units will be
generically referred to in this specification as symbols.
[0053] The non-information part 1402 represents re- 
dundant information. As will be explained herein, for any 
disk, e.g. disk number 1404 represented by a single col- 
umn of the array, the redundant information 1402 rep- 
resents redundancy information from other disks - that 
is the redundant information is only from disks other than 
1404. 

[0054] The X-Code system forms a column
representing the contents of the entire disk 1404. The parity
symbols of the X-Code are formed of two extra rows 1402
on the disk. Each disk therefore has N-2 information
symbols as well as two parity symbols. Any error or
erasure of a symbol in a column can be recovered from
column erasures.

[0055] Turning specifically to the encoding procedure,
if we let $C_{i,j}$ be the symbol of the ith row and jth column,
then the parity symbols of X-Code are constructed
according to equation 1:

\[ C_{n-2,i} = \sum_{k=0}^{n-3} C_{k,\langle i+k+2 \rangle_n} \]
\[ C_{n-1,i} = \sum_{k=0}^{n-3} C_{k,\langle i-k-2 \rangle_n} \qquad (1) \]

where $i = 0, 1, \ldots, n-1$, and $\langle x \rangle_n = x \bmod n$.
This translates in geometrical terms to the parity rows
representing the checksums along the diagonals of
slope 1 and -1, respectively.

[0056] Figure 14A shows how the first parity check
row 1410 is obtained by assuming that the second parity
check row 1412 does not exist or is all zeros. This is
referred to as an imaginary zero row. Checksums are
formed on all diagonals of slope -1. In Figure 14A, all of
the triangle shapes are added to form the first parity
check row 1410. This means that the elements 1414,
1416, 1418 and 1420 are added to form the parity
element 1422.

[0057] Figure 14B shows an example of calculating
the first parity check row for exemplary single bits.
Notice the diagonal elements 1414, 1416, 1418 and 1420
require addition of 1+1+1+0 leading to a parity of 1 which
is stored as symbol 1422.

[0058] The diagonals are continued in an adjoining
row once reaching the outer edge of the array. For
example, the diagonal row 1430 including elements 1432,
1434, 1436 and 1438 is continued beginning at the top
of the next row as 1440. The parity symbol 1436
corresponds to an addition of the symbols 1432, 1434, 1438
and 1440. Figure 14B shows these symbols
corresponding to 0+0+0+1 which equals 1. The value 1 is
stored as symbol 1436.

[0059] The second parity check row is formed from a
diagonal of slope +1. Figure 14C shows this analogous
second parity row calculation with Figure 14D showing
a specific example. The row 1440 includes symbols
1442, 1444, 1446, 1448 and 1450. Parity symbol 1450
is calculated as 1442 + 1444 + 1446 + 1448. Figure 14D
shows a concrete example where the parity 0 is
obtained from a sum of 1+0+0+1 = 0.

[0060] Figure 14E shows the complete code word
formed by combining the two parity check rows. The two
parity check rows are obtained totally independently of
one another. Each information symbol appears exactly
once in each parity row. All parity symbols depend only
on information symbols from other columns (other disks)
and not on each other. Therefore, an update of one
information symbol results in an update of only two parity
symbols.
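
The encoding described above can be illustrated with a short Python sketch. It assumes the index convention in which the first parity row at column i sums the information symbols in columns (i+k+2) mod n and the second parity row sums those in columns (i-k-2) mod n, for rows k = 0 to n-3, with the checksum taken as exclusive OR. The function name xcode_encode and the sample data are illustrative only.

```python
def xcode_encode(info):
    """Append the two parity rows to an (n-2) x n array of information symbols.

    info[k][c] is the symbol in row k of column (disk) c. Symbols are
    integers and the checksum is bitwise exclusive OR.
    """
    n = len(info[0])
    if len(info) != n - 2 or any(len(row) != n for row in info):
        raise ValueError("expected n-2 rows of n symbols each")
    first = [0] * n   # parity row n-2
    second = [0] * n  # parity row n-1
    for i in range(n):
        for k in range(n - 2):
            first[i] ^= info[k][(i + k + 2) % n]
            second[i] ^= info[k][(i - k - 2) % n]
    return [list(row) for row in info] + [first, second]


# A 5 x 5 example with single-bit symbols (sample data, not from the figures).
info = [[(3 * r + 5 * c + r * c) % 2 for c in range(5)] for r in range(3)]
code = xcode_encode(info)

# Changing one information symbol changes exactly two parity symbols.
changed_info = [row[:] for row in info]
changed_info[1][3] ^= 1
changed_code = xcode_encode(changed_info)
touched = [(r, c) for r in (3, 4) for c in range(5)
           if code[r][c] != changed_code[r][c]]
```

In this sketch `touched` lists one cell in each parity row, and neither lies in the column of the changed symbol, which is the two-parity-update property stated above.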

[0061] X-code as described above uses a prime
number n allowing for real diagonal computation. If n is
not prime, however, a different line of computation can
be used. For example, any suitable given envelope
which traverses all of the n-1 disks can be used
according to X-Code. All of the lines are preferably parallel.
[0062] As described above, X-Code has a column
distance of three allowing correction of two column
erasures or one column error. An erasure is when there is a
problem and it is known which area has the problem. An
error occurs when the specific source of the problem is
unknown. The decoding operation can be used without
requiring finite field operations, using only cyclic shift
and exclusive OR.

[0063] Correction of one erasure can simply recover
that erasure along the diagonals of slope 1 or -1 using
either of the parity rows.

[0064] In an array of size N by n, assume that two
columns are erasures. In this case, the basic unknown
symbols of the two columns are the information symbols
in those columns. Since each of the columns has (n-2)
information symbols, the number of unknown symbols
becomes 2 x (n-2). Similarly, the remaining array includes
2 x (n-2) parity symbols, including all of the 2 x (n-2)
unknown symbols. Hence, the erasure correction
becomes a problem of solving 2 x (n-2) unknowns from 2
x (n-2) linear equations. Since these linear equations
are linearly independent, these linear equations
become solvable.

[0065] Moreover, no two information symbols of this
code in the same column can appear in the same parity
symbol. Therefore, each equation has at most two
unknown symbols. Some equations have only one
unknown symbol. This will drastically decrease the
complexity of equation solving. The system used according
to this system starts with any equation with only one
unknown symbol. Solving for those equations is
relatively simple. The process continues to solve for the other
unknown symbols until all equations are solved.
[0066] Suppose the erasure columns are the ith and
jth ($0 \le i < j \le n-1$) columns. Since each diagonal
traverses only n-1 columns, if a diagonal crosses a column at
the last row, no symbols of that column are included in
this diagonal. This determines the position of the parity
symbol including only one symbol of the two erasure
columns. The symbol can be recovered from the simple
checksum along this diagonal.

[0067] First consider the diagonals of slope 1.
Suppose the xth symbol of the ith column is the only
unknown symbol in a diagonal. Then, this diagonal hits the
jth column at the (n-1)th row, and hits the first parity row
at the yth column, i.e., the three points (x,i), (n-1,j) and
(n-2,y) are on the same diagonal of slope 1, thus the
following equations hold:

\[ (n-1) - x \equiv j - i \pmod{n} \]
\[ (n-1) - (n-2) \equiv j - y \pmod{n} \]

Since $1 \le j-i \le n-1$ and $0 \le j-1 \le n-1$, the
solutions for x and y are

\[ x = \langle (n-1) - (j-i) \rangle_n = (n-1) - (j-i) \]
\[ y = \langle j-1 \rangle_n = j-1 \]

So, the parity symbol $C_{n-2,\,j-1}$ allows calculation of the
symbol $C_{(n-1)-(j-i),\,i}$ in the ith column. Similarly, the
symbol $C_{(j-i)-1,\,j}$ in the jth column can be solved directly
from the parity symbol $C_{n-2,\,\langle i-1 \rangle_n}$.
[0068] Symmetrically with the diagonals of slope -1,
the symbol $C_{(j-i)-1,\,i}$ in the ith column can be solved from
the parity symbol $C_{n-1,\,\langle j+1 \rangle_n}$, and the symbol
$C_{(n-1)-(j-i),\,j}$ in the jth column can be solved from the
parity symbol $C_{n-1,\,\langle i+1 \rangle_n}$.

[0069] Notice that an information symbol is crossed
by the diagonals of slope 1 and -1 exactly once each.
If an unknown symbol is solved along a diagonal
of slope 1 (or -1), then, using the parity symbol along the
diagonal of slope -1 (or 1) which crosses the solved symbol,
another unknown symbol in the other column can be
solved. This procedure can be used recursively until the
parity symbol is in an erasure column or the solved symbol
itself is a parity symbol. These same techniques can be
used to recover any desired unknown symbol or symbols.
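
The recovery procedure of paragraphs [0066] to [0069] can be sketched as a loop that keeps solving any checksum equation with exactly one unknown symbol. This is a minimal illustration, not the patented implementation: it assumes XOR checksums, a prime n, and parity rows laid out along the slope 1 and slope -1 diagonals discussed above. The function names `xcode_equations`, `xcode_encode` and `recover_columns` are hypothetical.

```python
def xcode_equations(n):
    """Return one cell list per parity symbol; the cells of each list XOR to zero."""
    eqs = []
    for i in range(n):
        # Slope 1 diagonal whose parity symbol sits at (n-2, i).
        eqs.append([(n - 2, i)] + [(k, (i + k + 2) % n) for k in range(n - 2)])
        # Slope -1 diagonal whose parity symbol sits at (n-1, i).
        eqs.append([(n - 1, i)] + [(k, (i - k - 2) % n) for k in range(n - 2)])
    return eqs


def xcode_encode(info, n):
    """info: (n-2) rows by n columns of ints; returns the full n by n array."""
    arr = [row[:] for row in info] + [[0] * n, [0] * n]
    for eq in xcode_equations(n):
        (pr, pc), members = eq[0], eq[1:]
        for r, c in members:
            arr[pr][pc] ^= arr[r][c]
    return arr


def recover_columns(arr, n, lost):
    """Rebuild the erased columns in place by solving one-unknown equations."""
    unknown = {(r, c) for c in lost for r in range(n)}
    eqs = xcode_equations(n)
    while unknown:
        solved_any = False
        for eq in eqs:
            missing = [cell for cell in eq if cell in unknown]
            if len(missing) != 1:
                continue
            value = 0
            for r, c in eq:
                if (r, c) != missing[0]:
                    value ^= arr[r][c]
            r, c = missing[0]
            arr[r][c] = value
            unknown.remove(missing[0])
            solved_any = True
        if not solved_any:
            raise ValueError("erasure pattern not recoverable")
    return arr
```

Each pass plays the role of one step in the recursive chain: a symbol solved along one diagonal turns a crossing diagonal into an equation with a single unknown.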

[0070] The preferred system uses N = n or N being 
prime. Systems such as Figs 5 and 6, (n=4; k=2) can 
also be used as described above. 

Distributed Read/Write

[0071] The system allows a new kind of operation by 
its use of a distributed read and write system. 
[0072] The redundant storage of information allows
the system to read from all n of the nodes to maximize
the bandwidth of the system. In that case, the system is
reading only from the raw information parts 1502 of the
nodes.

[0073] Alternatively, only k of the nodes are read, but
those k are read along with their parity portions 1504.
Unlike the conventional "correcting", this system selects
which of the available clusters will be used, based on
the system's view of the state of the network. Different
parts could be used for different codes, e.g., the even/odd
code.
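
The choice between the two read modes can be sketched as follows. This is an illustrative sketch only: the function name `plan_read`, the boolean `node_up` view and the returned dictionary layout are assumptions, and a real system would take its view of the network from the monitoring layer.

```python
def plan_read(node_up, k):
    """Decide which nodes to read and which portions to fetch from them.

    node_up: list of booleans, one per node, True if the node looks usable.
    k: number of nodes whose raw plus parity portions suffice to rebuild the data.
    """
    n = len(node_up)
    usable = [node for node, up in enumerate(node_up) if up]
    if len(usable) == n:
        # Every node is reachable: read raw information only, from all n nodes.
        return {"nodes": usable, "portions": ["raw"]}
    if len(usable) < k:
        raise RuntimeError("fewer than k usable nodes")
    # Otherwise read k usable nodes together with their parity portions.
    return {"nodes": usable[:k], "portions": ["raw", "parity"]}
```

For example, with four nodes and k = 2, a single down node switches the plan from a raw-only read of all four nodes to a raw-plus-parity read of two.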

[0074] Distributed write involves writing to all affected
nodes each time information is changed. However, the
update is kept as small as possible. The
MDS code guarantees redundancy and makes the update
optimally small and efficient. The average unit parity
update number represents the average number of
parity bits that are affected when a single information
bit changes in the code. This parameter becomes
particularly crucial when array codes are used in
storage applications. X-code is optimal in the sense that
each single information bit change requires an update
of only two parity bits.
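
The two-parity-bit property can be checked by counting, for each information symbol, the parity symbols whose checksum covers it. The sketch below assumes the parity symbol at (n-2, i) covers the information symbols (k, (i+k+2) mod n) and the parity symbol at (n-1, i) covers (k, (i-k-2) mod n), which matches diagonals of slope 1 and -1. The helper names are hypothetical.

```python
def parity_symbols_for(k, c, n):
    """Parity cells whose checksum includes information symbol (row k, column c)."""
    hits = []
    for i in range(n):
        if (i + k + 2) % n == c:
            hits.append((n - 2, i))  # slope 1 parity row
        if (i - k - 2) % n == c:
            hits.append((n - 1, i))  # slope -1 parity row
    return hits


def average_unit_parity_update(n):
    """Average number of parity symbols touched by one information symbol change."""
    counts = [
        len(parity_symbols_for(k, c, n))
        for k in range(n - 2)
        for c in range(n)
    ]
    return sum(counts) / len(counts)
```

Under these assumptions every information symbol lies on exactly one diagonal of each slope, so a write touches one symbol in each parity row.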

[0075] Another important feature of X-code follows
from its formation of independent parity bits. Many of the
codes which have been previously used rely on dependent
parity columns in order to form code distances
of three. Since the parities are dependent on one another,
the calculation of these parities can be extremely
complicated. This often leads to a situation where the
average unit parity update number of the code increases
linearly with the number of columns of the array.
[0076] Systems such as the EVENODD code described
in U.S. Patent No. 5,579,475 and other similar
systems use independent parity columns to make the
information update more efficient.

Detection 

[0077] The distributed data storage system spreads 
the server function across the nodes. This is done ac- 
cording to the present system using a special commu- 
nication layer running on each of the multiple nodes 
which is transparent to the application. A special distributed
read system and distributed write system also
maintain the robust operation of the system.
[0078] The communication architecture of the pre- 
ferred system is shown in Figure 7. The actual commu- 
nication and network interfaces are shown as elements 
700. The communication can be done in any conventional
manner, including Ethernet, Myrinet, ATM, Servernet,
or any other conventional scheme of communication.
These conventional network interfaces are controlled
by the redundant communication layer.
[0079] The communication is monitored by the net 
monitor ("NETM") protocol system 702. NETM main- 
tains a connectivity protocol which determines channel 
state and history of the channel state at each node. More 
specifically, NETM monitors all connections from the lo- 
cal node on which NETM is running, to each remote 
node, over each connection path from the local node to 
the remote node. NETM maintains a connectivity chart 
which includes an indication of the status of all of the 
possible connections from the local node to each remote 
node at all times. 

[0080] The actual communication is controlled by the 
reliable user data protocol ("RUDP"). RUDP operates 
based on a request to communicate from the local node 
("node A") to some other node ("node B"). RUDP then 
obtains connectivity information about properly-operat- 
ing communications paths from node A to node B from 
NETM. RUDP selects a communication path using the 
information gathered by NETM, and sends the informa- 
tion using bundled interfaces. RUDP also properly pack- 
ages the information using known protocol systems, to 
provide in-order confirmed delivery. 
[0081] The NETM system runs on each node of the system
to find information about the system. NETM sees
the node on which it is running as the local node. NETM
uses system clues to determine the state of the connection
between the local node and all other nodes in the
system.
[0082] Since the same protocol is running on all
nodes, each NETM process on each node will determine
the same condition for any given A to B connection
state. NETM also uses a history checking mechanism,
such that all nodes see the same history of channel state
over time.

[0083] The preferred system clues are obtained from
messages that are sent from node A to each other node
in the system, over each possible path to the other node.
These messages are called "heartbeats". NETM sends
a message from the local node ("node A") to each remote
node ("node B") over each pathway. Each connection
is characterized by three items of information called
the $C_{i,j,k}$ "tuple", including i = local interface, j = remote
node and k = remote interface. This tuple defines an
unambiguous path.
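
As an illustration of the tuple, the sketch below enumerates every (i, j, k) path from a local node and builds a simple up/down chart keyed by those tuples. It assumes that any local interface can reach any remote interface, which depends on the switch wiring. The function name `enumerate_paths` and the sample node names are hypothetical.

```python
def enumerate_paths(local_interfaces, remote_interfaces):
    """List every (local interface, remote node, remote interface) tuple.

    remote_interfaces maps a remote node id to the list of its interface ids.
    """
    return [
        (i, j, k)
        for i in local_interfaces
        for j, ifaces in sorted(remote_interfaces.items())
        for k in ifaces
    ]


# Hypothetical topology: two local interfaces, node "B" with two, node "C" with one.
paths = enumerate_paths([0, 1], {"B": [0, 1], "C": [0]})
# Connectivity chart: every path starts out assumed up until hints say otherwise.
connected = {path: True for path in paths}
```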

[0084] NETM uses the heartbeats to determine if
there is an operational communication link between A
and B over each pathway $C_{i,j,k}$. Since the NETM protocol
is also running on node B, that remote NETM will
likely make the same decision about the state of connectivity
from node B to A over pathway $C_{i,j,k}$.
[0085] Certain faults, such as, for example, a buffer
overflow, might cause a loss of channel in only one direction.
The connection protocol uses a token passing
system to make the history of the channel symmetrical.
[0086] The history detection is based on a pool of
hints about the operability of the connection. The heartbeat
is the preferred hint, and is described herein in further
detail. Another hint, for example, is a fault indication
from the communication hardware, e.g., from the Myrinet
card. If the Myrinet card that is controlling the communication
on path X indicates that it is inoperable, the
protocol can assume that path to be inoperable.

[0087] The pool of hints is used to set the state of a
variable which assesses the state of the communication
path A to B over X. That variable has the value U for up
and D for down.

[0088] The operation is shown in the summary flowchart
of Figure 8. The Figure 8 embodiment uses a
heartbeat message formed from an unreliable message.
A reliable messaging system requires the sending
node to receive confirmation of receipt of a message.
The sending node will continue to send the message
until some confirmation of receipt of the message is obtained
by the sending node. In contrast, the Figure 8 system
uses unreliable messaging: that is, the message is
simply sent. No confirmation of receipt is obtained.
[0089] The message 800 is sent as an unreliable
package message to node B. The heartbeat is preferably
sent every 10 ms. The system waits and checks network
hints at step 802 to assess the state and history of
the network link. The heartbeat can be any message
that is sent from one node to the other node.

[0090] Since the same protocol is running on each
node, each node knows that it should receive a heartbeat
from each other node each 10 ms. Each NETM
runs a timer which is reset each time that NETM receives
a heartbeat from the other node. If the timer expires
without receiving a heartbeat from the other node,
then the judgement can be made that there is a problem
with the connection.
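
The timer behavior can be sketched with explicit timestamps so that it runs without real delays. The class name `HeartbeatTimer` and the 30 ms timeout in the usage note are assumptions; the text specifies only the 10 ms heartbeat interval.

```python
class HeartbeatTimer:
    """Per-path timer that is reset by every heartbeat received from the peer."""

    def __init__(self, timeout_ms, now_ms=0):
        self.timeout_ms = timeout_ms
        self.deadline_ms = now_ms + timeout_ms

    def on_heartbeat(self, now_ms):
        # A received heartbeat pushes the deadline forward (a "time-in" hint).
        self.deadline_ms = now_ms + self.timeout_ms

    def expired(self, now_ms):
        # True once no heartbeat has arrived within the timeout (a "time-out" hint).
        return now_ms >= self.deadline_ms
```

With a hypothetical timeout of 30 ms, a heartbeat received at t = 10 ms keeps the path alive until t = 40 ms, so up to two consecutive lost heartbeats are tolerated.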

[0091] Each side also tries to ensure that it sees the 
same history over time. This is carried out by passing 
reliable tokens between the pair of nodes constituting 
the point to point protocol. Each token indicates that the 
node has seen an event. When the token is received by 
the other node, it, too, should have seen a comparable
event and sent a token. Each side passes a token when 
it sees the event. This maintains the history on both 
sides as being the same. 

[0092] Each side has a finite number of tokens that 
can be passed. This has the effect of limiting the number 
of events that can occur before the event is acknowl- 
edged by the other node. For example, if there are two 
tokens per side initially, then the node only has two to- 
kens to pass. After each perceived change in channel 
state, a token is passed. If no token arrives from the oth- 
er side, the node will run out of tokens after these two 
perceived changes. This means that each node can only 
be two events or actions ahead of (or behind) the other 
node. The token passing limits the number of degrees 
of freedom between the two nodes - how far apart the 
two nodes can be before one holds the reported state
of the channel as down waiting for the other side to catch
up. 

[0093] Another way of looking at this is that the tokens 
set the maximum number of transitions that one node 
can make before hearing that the other node has acted 
similarly. 

[0094] The preferred embodiment of the NETM system
is illustrated in the connectivity protocol state machine
of Figure 9 and the flowchart of Figures 10A and
10B. Step 1000 comprises an initial step of forming the
$C_{i,j,k}$ 3-tuple comprising the local interface ID, the remote
machine ID and remote interface ID for each possible
physical channel from the node to all other known
nodes. The process ConnP($C_{i,j,k}$) is run for all $C_{i,j,k}$
3-tuples to determine the connectivity state for each of
these channels. This creates a data structure called
Connected($C_{i,j,k}$) that stores a Boolean value indicating
the up/down (1 or 0) status for each $C_{i,j,k}$ channel.
[0095] Step 1002 determines whether there has been
a ConnP($C_{i,j,k}$) event. If not, there is nothing to do, so
the process returns.

[0096] If there is an event detected at step 1002, flow
then proceeds to step 1004 which determines if the
event is a system up event. If so, the result returns a "1".
If not, the result returns a "0".

[0097] The link status flowchart of Figure 10B uses a
count of "tokens" as evidence of the operation of the other
endpoint system.

[0098] At step 1010, the process begins with the token
count (T) being set to its initial value n>2. The system
starts with its state initially being up ("1") at step 1012.
Step 1014 detects whether there has been a time-in
event. A time-in event is caused, for example, by the
receipt of a heartbeat from the node B. Since the state
is already up at this point, the detection of a time-in event
leaves the state up and takes no further action. If there
is not a time-in event at step 1014, then step 1016 determines
whether there has been a time-out event caused, e.g., by
not receiving an expected heartbeat before the timer expired.
If not, step 1018 determines whether a token has been
received ("a token arrival event"). If none of these events
has occurred, control again passes to step 1012 where the
node continues to monitor whether one of those events
has occurred. Since the system always has a token at
that point, there is no need to check for another one.
[0099] The time-out event at step 1016 means that no
heartbeat has been received from node B over path X,
so that there is likely a problem with communication to
node B over path X. Hence, control passes to step 1020,
which sends a token to the node B indicating the time-out
event reporting the omission of heartbeats for the
specified time. Since the token has been sent, the token
count is also decremented at 1020. This is followed
by changing the state of ConnP to D at step 1022.
[0100] A token arrival event at step 1018 is followed
by a step of receiving the token at 1024 and incrementing
the token count. If the current token count is less
than the maximum token value n at 1026, the token
count is incremented at 1028. Since there is a missing
token, the transition on the other end is within the
degrees of freedom allowed by the token passing
scheme and the received token brings the two sides
back in sync.

[0101] If the token count is not less than n, the token
count is at its maximum value. The system therefore
needs to undergo a transition. This is effected by sending
a token at 1030, followed by the system going down,
indicated by ConnP = 0 or D at 1022. This begins the
down routine processing operation.
[0102] The down routine processing operation is analogous
to the up routine processing operation. A time-out
event is detected at 1030 which has no effect since
the system is already down. A time-in event is detected
at 1032. This time-in event will allow the system to return
to the UP state, provided that a token exists to send in
order to indicate the transition. The routine checks for a
token at step 1040. If none is available, then no transition
can occur, and flow returns to 1022. If a token
exists to be passed, then it is passed at 1042, and the
token count is decremented. The ConnP variable returns
to its UP state, and begins the token processing
routine.
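
The up and down routines can be approximated by a small state machine. This is a loose sketch of the flowchart description, not a faithful transcription of Figure 10B. The class and method names are hypothetical. Two points the text leaves open are filled by assumption: a token arriving in the down state is handled symmetrically to the up state, and a token arriving when the count is already at its maximum flips the state while leaving the count unchanged (one token received, one sent).

```python
class ConnP:
    """Token-limited up/down state for one path (assumed behavior, see lead-in)."""

    def __init__(self, max_tokens=3):
        self.max_tokens = max_tokens
        self.tokens = max_tokens
        self.up = True
        self.tokens_sent = 0  # tokens handed to the peer, for inspection only

    def _flip_with_token(self):
        # Announce our own transition by spending one token.
        self.tokens -= 1
        self.tokens_sent += 1
        self.up = not self.up

    def on_time_out(self):
        # Missing heartbeat: go down if we are up and still hold a token.
        if self.up and self.tokens > 0:
            self._flip_with_token()

    def on_time_in(self):
        # Heartbeat seen: return to up only if a token is available to send.
        if not self.up and self.tokens > 0:
            self._flip_with_token()

    def on_token(self):
        if self.tokens < self.max_tokens:
            # The peer caught up with an earlier transition of ours.
            self.tokens += 1
        else:
            # Assumption: the peer moved first, so follow it; count stays at max.
            self.tokens_sent += 1
            self.up = not self.up
```

Once a node has spent all of its tokens it stays in its current state until the peer returns one, which bounds how far the two histories can diverge.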

[0103] Each system of node A to node B over path X 
is characterized in this way by the NETM protocol. 
[0104] The applications run on top of RUDP. An
application with a process ID first identifies
itself to the system. For example, the application may
send a message identifying itself as process 6 and indicating
a desire to send to process 4. This identification
uses the $C_{i,j,k}$ tuple described above. NETM determines
a communication path for this operation.
[0105] The actual communication, once determined,
operates using the so-called sliding window protocol.
Sliding window is well known and is described, for example,
in U.S. Patent Number 5,307,351. Sliding window
supervises a reliable messaging scheme by appropriate
packaging of the data packet. Sliding window essentially
manages sequence numbers and acknowledgements.
The data is sent as a reliable packet, requiring the
recipient to acknowledge receipt before more than one
window will be sent out. Once the receipt is properly
acknowledged, the window of information "slides" to the
next unacknowledged packet of information.
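
A minimal sender-side sketch of the sliding window idea is shown below. It assumes cumulative acknowledgements and omits retransmission timers. The class name `SlidingWindowSender` is hypothetical, and the code is not taken from the cited patent.

```python
class SlidingWindowSender:
    """Tracks which sequence numbers may be outstanding at any moment."""

    def __init__(self, packets, window):
        self.packets = list(packets)
        self.window = window
        self.base = 0  # sequence number of the oldest unacknowledged packet

    def in_flight(self):
        """Sequence numbers allowed to be outstanding right now."""
        end = min(self.base + self.window, len(self.packets))
        return list(range(self.base, end))

    def on_ack(self, seq):
        """Cumulative acknowledgement: everything up to and including seq arrived."""
        if seq >= self.base:
            self.base = min(seq + 1, len(self.packets))

    def done(self):
        return self.base == len(self.packets)
```

The window never extends more than `window` packets past the oldest unacknowledged one, so the sender cannot run more than one window ahead of the receiver.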
[0106] RUDP uses the sliding window module to per- 
form the actual communication. RUDP also calls NETM 
to provide a valid information path. If more than one of 
the paths between nodes is usable, then RUDP cycles 
between the usable paths. 

[0107] RUDP also acts as a logical network by recon- 
figuring the system using the information provided by 
NETM. 

[0108] The basic RUDP flowchart is shown in Figure 
11. The operation starts with a determination of a receive
event at step 1100. If no receive event is detected
at step 1100, step 1102 determines if there has been a
send event. If not, LNET has nothing to do, and flow re- 
turns to continue checking for events. 
[0109] If a receive event is detected at step 1100, flow
passes to step 1110 which determines whether the data
is indicative of some $C_{i,j,k}$ tuple. If not, an error is determined
at step 1112.

[0110] If proper data is obtained, that data is received 
at step 1114 and then returned to the system at step 
1116. 

[0111] A send event requires the $C_{i,j,k}$ arguments indicating
the data to be sent, and the remote machine to
receive the event. This requires a determination at 1120
of whether some up channel $C_{i,j,k}$ exists for the remote
machine indicated as one of the arguments of the operation.
If not, step 1122 declares a lost connection error.
If, in the more usual case, at least one up channel exists,
it is addressed using the arguments of the $C_{i,j,k}$ tuple.
The process then returns at 1130.
[0112] The process 1120 uses NETM to look up the 
existing paths from the local machine to the remote ma- 
chine. Therefore, NETM maintains the data structure 
while LNET uses the data structure. 
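
The send-side lookup can be sketched as a selector that reads the connectivity chart and cycles between usable paths. The names `PathSelector` and `LostConnectionError`, and the use of a plain dictionary for the chart, are assumptions made for illustration.

```python
class LostConnectionError(Exception):
    """Raised when no up channel exists to the requested remote node."""


class PathSelector:
    def __init__(self, connected):
        # connected maps (local interface, remote node, remote interface) -> bool
        # and is assumed to be maintained elsewhere by the monitoring layer.
        self.connected = connected
        self._next = {}

    def pick(self, remote):
        """Return the next up path to `remote`, cycling between usable paths."""
        up = sorted(
            path for path, ok in self.connected.items() if ok and path[1] == remote
        )
        if not up:
            raise LostConnectionError(remote)
        index = self._next.get(remote, 0) % len(up)
        self._next[remote] = index + 1
        return up[index]
```

Because the chart is re-read on every call, a path marked down by the monitor simply drops out of the rotation on the next send.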

INFORMATION SERVER 

[0113] The system described herein has special ap- 
plication in an information server - i.e. a server that pro- 
vides information to a user on request. The information 
server can be an Internet (web) server, a video server, 
or any other type device where information is provided. 
[0114] The system is used as a server in the sense 
that any node can request any stored information from 
any other node or combination of nodes. For example,
a request can be made which requires the information
from 25 different nodes. This system can select the 25
closest nodes or 25 least-used nodes. This allows the
system to ignore overloaded nodes just as if they were
faulted.
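
Selecting the least-used nodes can be sketched with the standard library. The load metric, the use of `None` to mark a faulted node, and the function name `pick_nodes` are assumptions.

```python
import heapq


def pick_nodes(load_by_node, count):
    """Return the `count` least-loaded usable nodes, lightest first.

    load_by_node maps node id -> load estimate, or None for a faulted node.
    """
    usable = {node: load for node, load in load_by_node.items() if load is not None}
    if len(usable) < count:
        raise RuntimeError("not enough usable nodes")
    return heapq.nsmallest(count, usable, key=usable.get)
```

An overloaded node is passed over exactly as a faulted one would be, only by ranking rather than by exclusion.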

[0115] When it is used as a video server, the video 
that is to be delivered might be stored anywhere on the 
system. According to the present scheme, the video is 
stored as distributed information among the different 
nodes of the network in a way that allows the video
information to be retrieved even in the event of specified
network failures. 

[0116] The server system requests the video to be 
provided from the node that is storing it. The special 
techniques of the system ensure that no specified 
number of failures can interrupt operation of the system 
as a whole. No two-node failure, for example, can prevent
obtaining the stored information, since the information 
is redundantly stored at other locations in the network. 
[0117] Another application is as a web server. The 
web server uses the TCP/IP protocol and packeted 
communications to obtain Internet information. Again, 
this information could be stored anywhere within the dis- 
tributed server. No two faults of any kind - communica- 
tion or storage, can prevent the information from being 
obtained. 

[0118] Another application of this system is in expansion
and repair. Any node can be removed at any time,
and the rest of the system will continue to operate with- 
out interruption. That node could be replaced with a 
blank node, in which case the network will begin writing 
information to the blank column it sees using the redun- 
dancy data. 

[0119] Although only a few embodiments have been 
disclosed in detail above, those having ordinary skill in
the art will recognize that other embodiments are within 
the disclosed embodiments, and that other techniques 
of carrying out the invention are predictable from the dis- 
closed embodiments. 



Claims 

1. A redundant distributed network system, comprising:

a plurality of system nodes (100, 102, 104, 
106), each of said system nodes including at 
least two communication devices and a storage 
device, said storage device including raw data 
and redundant data indicative of raw data that 
is stored in nodes other than said each node to 
permit reconstruction of stored data from the 
storage information in any one node when com- 
bined with data in any other node; 
a plurality of switching devices (110, 112), connected
to said communication devices of said
system nodes in a way such that each of said
communication devices in any one system
node is connected to a different one of said 
switching devices, allowing each of said system 
nodes to communicate to each other of said 
system nodes over one of at least two different 
paths, thereby providing redundant communi- 
cation; 

a detection routine at each node (100, 102, 104, 
106) which detects system functional state 
which may prevent any operation of the net- 
work system; and 

a logical network process (150) which reconfig- 
ures the network using one of at least said com- 
munication device redundancy and said stor- 
age information redundancy as needed to com- 
pensate for the system functional state using 
the network redundancy. 

2. A system as in claim 1, wherein said detection routine
detects functional states on at least a plurality
of said system nodes, with an identical protocol being
run by each said detection routine on each of
said system nodes.

3. A system as in claim 1, wherein said system functional
state includes faults in network communication
or faults in memory storage or any other kind of fault
which produces an undesired result.

4. A system as in claim 3, wherein when said system 
functional state includes a fault in network commu- 
nication, said logical network process commands a 
connection to be changed to a different connection. 

5. A system as in claim 4, wherein when said system
functional state includes a fault in memory storage,
said logical network process commands desired information
to be obtained from said redundant data
storage.

6. A system as in claim 1, wherein said detection routine
(702) operates in each node (100, 102, 104,
106) to view a state of connection to other nodes in
the network.

7. A system as in claim 6, wherein said detection rou- 
tine (702) operates to determine said state of con- 
nection using hints about a condition of the network. 

8. A system as in claim 7, wherein said hints include 
a heartbeat signal which is produced by each said 
node at specified intervals, and said detection rou- 
tine operates to receive said heartbeat signal and 
to detect a presence or absence of said heartbeat 
signal as one of said hints. 

9. A system as in claim 7, further comprising a token
passing system, wherein each node (100, 102, 104,
106) determines events in a monitored node over a
monitored channel and passes a token to said monitored
node over said monitored channel to indicate
said event, wherein said monitored node passes
back said token to indicate operation based on said
event, and wherein each node has only a specified
number of tokens to limit a number of events which
can occur on one of said nodes without a corresponding
event occurring on the other of said
nodes.

10. A system as in claim 6, further comprising means 
for guaranteeing that each said node sees a same 
history of the network. 

11. A system as in claim 1, wherein said connection is
made such that no groups of computing nodes can
be isolated.

12. A system as in claim 1, wherein said switches (110,
112) connect said nodes in the most non-local way
possible.

13. A system as in claim 12, wherein said switches connect
between two nodes which are farthest from
one another.

14. A system as in claim 13, wherein said connections
are made such that no failure of any two nodes can
isolate any group of nodes from communicating
with any other group of nodes.

15. A system as in claim 1, wherein each node (100,
102, 104, 106) is connected with each other node
by at least two paths,
and further comprising a network monitor
(702) running at each said node and monitoring all
connections from a local node on which said network
monitor is running to each remote node over
each connection path from the local node to the
remote node.

16. A system as in claim 15, further comprising a reliable
user data protocol running on said local node,
and receiving a request to communicate from the
local node to some other node, and determining a
path from said network monitor process.

17. A system as in claim 16, further comprising reconfiguring
a path of said communicating using a logical
network interconnection that allows changing a
physical connection between the nodes to a different
node connection.

18. A system as in claim 1, further comprising a network
monitor (702), determining operational connections
among said nodes, a reliable user protocol, which
processes information for the running nodes, and a
logical network which reconfigures the communications
based on said operational connections.

19. A system as in claim 1, wherein said storage device
stores only a part of the information on each disk of 
each node. 

20. A system as in claim 19, wherein each disk of each
node also stores information indicating some prop- 
erty of information on other disks. 

21. A server as in claim 1, wherein said stored information
on each node stores only part, but not all of any
desired information, and wherein no two nodes 
store the same information. 

22. A server as in claim 21, wherein said stored information
includes an information portion, and a redundancy
dundancy portion, said redundancy portion being 
information indicative of information portions for 
other nodes only. 

23. A server as in claim 22, wherein said redundancy
portion is formed from an array code where a plurality
of said nodes are arranged into an array to
form said information portion, and said redundancy
portion is formed from checksums along diagonals
of said array.

24. A method of operating a network to provide redun- 
dancy comprising: 

executing a controlling process that carries out 
a distributed read from a plurality of system 
nodes (100, 102, 104, 106), said system nodes
including at least two communication devices,
each of said communication devices in any one
system node is connected to a different one of
a plurality of switching devices (110, 112), allowing
each of said system nodes to communicate
to each other of said system nodes over one of
at least two different paths, thereby providing
redundant communication, collectively storing
system data, each node storing raw data, and 
redundant data indicative of raw data that is 
stored in nodes other than said each node to 
permit reconstruction of stored data from the 
storage information in any one node when com- 
bined with data in any other node, each node 
including a detection routine (702) that detects 
system functional state that may prevent any 
operation of the network system; and 
performing a distributed read comprising the 
steps of 

determining a parameter related to availability
of system nodes, and
reading said raw data from said plurality of
system nodes if said parameter indicates
availability, and reading both said raw data
and said redundant data from less than
said plurality of system nodes if said parameter
indicates less than availability.

25. A method as defined in claim 24, further comprising:

storing raw information and redundant information
indicating the error correcting code into a
plurality of information nodes;
determining a parameter indicating usability of
said information nodes;
reading said raw information from said plurality
of nodes if said parameter indicates that said
plurality of nodes are usable, and reading both
said raw data and said redundant data from
less than said plurality of nodes if said parameter
indicates that at least a portion of said plurality
of nodes are less than usable.

26. A method as defined in claim 24, further comprising:

forming an array of information, by forming
each column of the array representing information
from a node;
forming a raw portion of each column including
raw information indicating data,
forming a redundant information indicating redundancy
information, said redundancy information
indicating information about other
nodes besides said each node, as taken along
an envelope of a specified shape that obtains
information from said other node.

27. A method as in claim 26, wherein said envelope is 
a diagonal which is extended to other nodes beyond 
edges of said array. 

28. A method as defined in claim 24, comprising:

mapping each node to a column of an array;
forming two rows of redundant information from
said columns of the array, and placing said two
rows into said columns, to form a resultant array
of N by N including N-2 by N information symbols,
and 2 by N redundant information symbols,
said parity symbols being constructed according to:

$$C_{n-2,\,i} = \sum_{k=0}^{n-3} C_{k,\,\langle i+k+2 \rangle_n}$$

$$C_{n-1,\,i} = \sum_{k=0}^{n-3} C_{k,\,\langle i-k-2 \rangle_n}$$

where $i = 0, 1, \ldots, n-1$, and $\langle x \rangle_n = x \bmod n$.

29. A network system as defined in claim 1, wherein the
storage devices store video information. 



Patentanspruche 

1. Redundantes, verteiltes Netzsystem, mit: 

w 

mehreren Systemknoten (100, 102, 104, 106), 
wovon jeder wenigstens zwei Kommunikati- 
onsvorrichtungen und eine Speichervorrich- 
tung umfaBt, wobei die Speichervorrichtung 
Rohdaten und redundante Daten enthalt, die *5 
Rohdaten angeben, die in von dem betrachte- 
ten Knoten verschiedenen Knoten gespeichert 
sind ! urn eine Rekonstruktion gespeicherter 
Daten aus den Speicherinformationen in ir- 
gendeinem Knoten zu ermoglichen, wenn die- 20 
se mit Daten in irgendeinem anderen Knoten 
kombiniert werden; 

mehreren Schaltvorrichtungen (110, 112), die 
mit den Kommunikationsvorrichtungen der Sy- 
stemknoten in einer Weise verbunden sind, in 25 
der jede der Kommunikationsvorrichtungen in 
irgendeinem Systemknoten mit einer anderen 
der Schaltvorrichtungen verbunden ist, wo- 
durch jeder der Systemknoten mit jedem ande- 
ren der Systemknoten uber wenigstens zwei 30 
verschiedene Wege kommunizieren kann, wo- 
durch eine redundante Kommunikation ge- 
schaffen wird; 

einer Erfassungsroutine in jedem Knoten (100, 
102.. 104, 106), die einen Systemfunktionszu- 35 
stand erfaBt, der jegliche Operation des Netz- 
systems verhindern konnte; und 
einem Logiknetzproze3 (150), der das Netz un- 
ter Verwendung der Kommunikationsvorrich- 
tungsredundanz und/oder der Speicherinfor- 40 
mationsredundanz je nach Bedarf rekonfigu- 
riert, um den Systemfunktionszustand unter 
Verwendung der Netzredundanz zu kompen- 
sieren. 

45 

2. System nach Anspruch 1 , wobei die Erfassungsrou- 
tine Funktionszustande wenigstens in mehreren 
der Systemknoten erfaBt, wobei jede Erfassungs- 
routine in jedem der Systemknoten ein identisches 
Protokoll ablaufen iaBt. 50 

3. System nach Anspruch 1 , wobei der Systemfunkti- 
onszustand Fehler in der Netz kommunikation oder 
Fehler in der Speicherung oder irgendeine andere 

Art von Fehler, der ein unerwunschtes Ergebnis er- 55 
zeugt, umfaBt. 

4. System nach Anspruch 3, wobei der Logiknetzpro- 



zeB dann, wenn der Systemfunktionszustand einen 
Fehter in der Netzkommunikation enthalt r eine An- 
derung von einer Verbindung zu einer anderen Ver- 
bindung befiehlt. 

5. System nach Anspruch 4, wobei der Logiknetzpro- 
zeB dann, wenn der Systemfunktionszustand einen 
Fehler in der Speicherung enthalt, befiehlt, er- 
wunschte Informationen aus der Speicherung red- 
undanter Daten zu erhalten. 

6. System nach Anspruch 1 . wobei die Erfassungsrou- 
tine (702) in jedem Knoten (100, 102, 104, 106) der- 
ail arbeitet, daB sie einen Verbindungszustand mit 
anderen Knoten im Netz beobachtet. 

7. System nach Anspruch 6, wobei die Erfassungsrou- 
tine (702) derart arbeitet, daB sie den Verbindungs- 
zustand unter Verwendung von Hinweisen uber ei- 
nen Zustand des Netzes bestimmt. 

8. System nach Anspruch 7, wobei die Hinwetse ein 
Herzschlag-Signal enthalten, das von jedem Kno- 
ten in spezifischen Intervallen erzeugt wird, und die 
Erfassungsroutine derart arbeitet, daB sie das 
Herzschlag-Signal empfangt und das Vorhanden- 
sein oder Fehlen des Herzschlag-Signals als einen 
der Hinweise erfaBt. 

9. The system of claim 7, further comprising a token passing system, wherein each node (100, 102, 104, 106) determines events in a monitored node over a monitored channel and sends a token to the monitored node over the monitored channel to indicate the event, wherein the monitored node returns the token to indicate an operation based on the event, and wherein each node has only a specified number of tokens, to limit the number of events that can occur in one of the nodes without a corresponding event occurring in the other of the nodes.

10. The system of claim 6, further comprising means for ensuring that each of the nodes sees the same history of the network.

11. The system of claim 1, wherein the connection is made such that no groups of computing nodes can be isolated.

12. The system of claim 1, wherein the switches (110, 112) connect the nodes in a manner that is as non-local as possible.

13. The system of claim 12, wherein the switches connect two nodes that are maximally distant from each other.

14. The system of claim 13, wherein the connections are made such that no failure of any two nodes can prevent any group of nodes from communicating with any other group of nodes.

15. The system of claim 1, wherein each node (100, 102, 104, 106) is connected to every other node by at least two paths,

and further comprising a network monitor (702) that runs in each node and monitors all connections from a local node, in which the network monitor runs, to each remote node over each connection path from the local node to the remote node.

16. The system of claim 15, further comprising a reliable user data protocol that runs in the local node, receives a request for communication from the local node to any other node, and determines a path from the network monitor process.

17. The system of claim 16, further comprising reconfiguring a communication path using a logical network connection that allows changing from one physical connection between the nodes to a different node connection.

18. The system of claim 1, further comprising a network monitor (702) that determines functional connections between the nodes, a reliable user protocol that processes information for the running nodes, and a logical network that reconfigures the communications based on the functional connections.

19. The system of claim 1, wherein the storage device stores only part of the information on each disk of each node.

20. The system of claim 19, wherein each disk of each node also stores information indicating some property of information on other disks.

21. The server of claim 1, wherein the stored information in each node stores only part, but not all, of any desired information, and wherein no two nodes store the same information.

22. The server of claim 21, wherein the stored information includes an information portion and a redundancy portion, the redundancy portion being information indicative of information portions for other nodes only.

23. The server of claim 22, wherein the redundancy portion is formed from a matrix code, wherein a plurality of the nodes are arranged as a matrix to form the information portion, and the redundancy portion is formed from checksums along the diagonals of the matrix.

24. A method of operating a network to provide redundancy, comprising:

executing a control process that performs a distributed read from a plurality of system nodes (100, 102, 104, 106), the system nodes including at least two communication devices, each of which, in any system node, is connected to a different one of a plurality of switching devices (110, 112), whereby each of the system nodes can communicate with every other one of the system nodes over one of at least two different paths, thereby providing redundant communication; collectively storing system data, wherein each node contains raw data and redundant data indicative of raw data stored in nodes other than that node, to allow reconstruction of stored data from the storage information in any node when combined with data in any other node; wherein each node includes a detection routine (702) that detects a system functional state that could prevent an operation of the network system; and

performing a distributed read comprising the following steps:

determining a parameter related to the availability of system nodes, and

reading the raw data from the plurality of system nodes if the parameter indicates availability, and reading both the raw data and the redundant data from fewer than the plurality of system nodes if the parameter indicates less than full availability.

25. The method of claim 24, further comprising:

storing raw information and redundant information indicating the error-correcting code in a plurality of information nodes;

determining a parameter indicating the usability of the information nodes;

reading the raw information from the plurality of nodes if the parameter indicates that the plurality of nodes are usable, and reading both the raw data and the redundant data from fewer than the plurality of nodes if the parameter indicates that at least some of the plurality of nodes are less than usable.

26. The method of claim 24, further comprising:

forming a matrix of information by forming each column of the matrix to represent information from one node,

forming a raw portion of each column, which contains raw information indicative of data,

forming redundant information indicative of redundancy information, the redundancy information indicating information about nodes other than the node in question, taken along an envelope of a specified shape that obtains information from the other nodes.

27. The method of claim 26, wherein the envelope is a diagonal that extends beyond the edges of the matrix to other nodes.

28. The method of claim 24, comprising:

mapping each node to a column of a matrix;

forming two rows of redundant information from the columns of the matrix and placing the two rows in the columns to form a resulting N x N matrix containing (N - 2) x N information symbols and 2 x N redundant information symbols, the parity symbols being constructed according to

\[ C_{n-2,i} = \sum_{k=0}^{n-3} C_{k,\langle i+k+2 \rangle_n} \]

\[ C_{n-1,i} = \sum_{k=0}^{n-3} C_{k,\langle i-k-2 \rangle_n} \]

where i = 0, 1, ..., n - 1 and \langle x \rangle_n = x mod n.

29. The network system of claim 1, wherein the storage devices store video information.
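
The parity construction of claim 28 can be illustrated with a short Python sketch. This is only a sketch under stated assumptions: the claim does not say which addition the sums use, so bitwise XOR is assumed here, and the function name `parity_rows` and the list-of-lists layout (row index k, column index i, one column per node) are hypothetical choices for illustration.

```python
from functools import reduce
from operator import xor


def parity_rows(info, n):
    """Compute the two redundant rows C[n-2][i] and C[n-1][i] of claim 28.

    info: list of n - 2 rows, each holding n integer symbols, where
          info[k][i] is information symbol C[k][i] (column i = node i).
    The sums are taken as XOR, which is an assumption of this sketch.
    """
    if len(info) != n - 2 or any(len(row) != n for row in info):
        raise ValueError("info must have n - 2 rows of n symbols each")
    ks = range(n - 2)  # k = 0 .. n-3, as in the claim
    # C[n-2][i]: sum over k of C[k][(i + k + 2) mod n]
    row_a = [reduce(xor, (info[k][(i + k + 2) % n] for k in ks), 0)
             for i in range(n)]
    # C[n-1][i]: sum over k of C[k][(i - k - 2) mod n]
    row_b = [reduce(xor, (info[k][(i - k - 2) % n] for k in ks), 0)
             for i in range(n)]
    return row_a, row_b
```

Because each index expression is taken mod n, every information symbol contributes to exactly one symbol of each redundant row, and that symbol is stored in a different column (node) from the information symbol itself.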



Claims

1. A redundant distributed network system, comprising:

a plurality of system nodes (100, 102, 104, 106), each of said system nodes including at least two communication devices and a storage device, said storage device including raw data and redundant data indicative of raw data that are stored in nodes other than said each node, to allow reconstruction of the stored data from the storage information in any node when combined with the data contained in another node,

a plurality of switching devices (110, 112), connected to said communication devices of said system nodes such that each of said communication devices in a system node is connected to a different one of said switching devices, allowing each of said system nodes to communicate with each other one of said system nodes via one of at least two different paths, thereby providing redundant communication,

a detection routine at each node (100, 102, 104, 106) that detects the functional state of the system that can prevent an operation of the network system, and

a logical network process (150) that reconfigures the network using at least one of said communication device redundancy and said storage information redundancy, as necessary, to compensate for the functional state of the system using the network redundancy.

2. The system of claim 1, wherein said detection routine detects the functional states on at least a plurality of said system nodes, an identical protocol being executed by each said detection routine on each of said system nodes.

3. The system of claim 1, wherein said functional state of the system includes network communication faults, memory storage faults, or any other type of fault that produces an undesired result.

4. The system of claim 3, wherein, when said functional state of the system includes a network communication fault, said logical network process commands changing a connection to a different connection.

5. The system of claim 4, wherein, when said functional state of the system includes a memory storage fault, said logical network process commands obtaining the desired information from said storage of redundant data.

6. The system of claim 1, wherein said detection routine (702) operates in each node (100, 102, 104, 106) to view a connection state to the other nodes of the network.

7. The system of claim 6, wherein said detection routine (702) operates to determine said connection state using hints about a condition of the network.

8. The system of claim 7, wherein said hints include a heartbeat signal that is produced by each said node at specified intervals, and said detection routine operates to receive said heartbeat signal and to detect a presence or an absence of said heartbeat signal as one of said hints.

9. The system of claim 7, further comprising a token passing system, wherein each node (100, 102, 104, 106) determines events in a monitored node on a monitored channel and passes a token to said monitored node via said monitored channel to indicate said event, wherein said monitored node passes said token back to indicate an operation based on said event, and wherein each node has only a specified number of tokens, to limit the number of events that can occur on one of said nodes without a corresponding event occurring on the other of said nodes.

10. The system of claim 6, further comprising means for ensuring that each said node sees the same history of the network.

11. The system of claim 1, wherein said connection is made such that no group of computing nodes can be isolated.

12. The system of claim 1, wherein said switches (110, 112) connect said nodes in the least local manner possible.

13. The system of claim 12, wherein said switches connect two nodes that are as far apart as possible.

14. The system of claim 13, wherein said connections are made such that no failure of two nodes can prevent one group of nodes from communicating with another group of nodes.

15. The system of claim 1, wherein each node (100, 102, 104, 106) is connected to each other node via at least two paths,

and further comprising a network monitor (702) running at each said node and monitoring all connections between a local node on which said network monitor runs and each remote node via each connection path between the local node and the remote node.

16. The system of claim 15, further comprising a reliable user data protocol that runs on said local node, receives a request to communicate from the local node to another node, and determines a path from said network monitor process.

17. The system of claim 16, further comprising reconfiguring a path of said communication using a logical network interconnection that allows changing a physical connection between the nodes to a different node connection.

18. The system of claim 1, further comprising a network monitor (702) that determines operational connections between said nodes, a reliable user protocol that processes information for the active nodes, and a logical network that reconfigures communications based on said operational connections.

19. The system of claim 1, wherein said storage device stores only part of the information on each disk of each node.

20. The system of claim 19, wherein each disk of each node also stores information indicating a certain property of the information on the other disks.

21. The server of claim 1, wherein said information stored on each node stores only part, but not all, of the desired information, and wherein no two nodes store the same information.

22. The server of claim 21, wherein said stored information includes an information portion and a redundancy portion, said redundancy portion being information indicative of information portions for other nodes only.

23. The server of claim 22, wherein said redundancy portion is formed from a matrix code in which a plurality of said nodes are arranged as a matrix to form said information portion, and said redundancy portion is formed from checksums along the diagonals of said matrix.

24. A method of controlling a network to provide redundancy, comprising:

executing a control process that performs a distributed read from a plurality of system nodes (100, 102, 104, 106), said system nodes including at least two communication devices, each of said communication devices in any system node being connected to a different one of a plurality of switching devices (110, 112), allowing each of said system nodes to communicate with each of said other system nodes over one of at least two different paths, so as to provide redundant communication; collectively storing the system data, each node storing raw data and redundant data indicative of raw data that are stored in nodes other than said each node, to allow reconstruction of stored data from the storage information in one node when combined with the data of another node, each node including a detection routine (702) that detects the functional state of the system that can prevent an operation of the network system; and

performing a distributed read comprising the steps of:

determining a parameter related to the availability of the system nodes, and

reading said raw data from said plurality of system nodes if said parameter indicates availability, and reading both said raw data and said redundant data from a number of nodes smaller than the total number of system nodes if said parameter indicates less than full availability.

25. The method of claim 24, further comprising:

storing raw information and redundant information indicating the error-correcting code in a plurality of information nodes,

determining a parameter indicating the usability of said information nodes,

reading said raw information from said plurality of nodes if said parameter indicates that said plurality of nodes are usable, and reading both said raw data and said redundant data from a number of nodes smaller than the total number of nodes if said parameter indicates that at least some of said plurality of nodes are not fully usable.

26. The method of claim 24, further comprising:

forming a matrix of information by forming each column of the matrix to represent information from one node,

forming a raw portion of each column, including raw information indicative of data,

forming redundant information indicative of redundancy information, said redundancy information indicating information about nodes other than said each node, taken along an envelope of a specified shape that obtains information from said other nodes.

27. The method of claim 26, wherein said envelope is a diagonal that extends to the other nodes beyond the edges of said matrix.

28. The method of claim 24, comprising:

mapping each node to a column of a matrix,

forming two rows of redundant information from said columns of the matrix, and placing said two rows in said columns, to form a resulting N by N matrix including N - 2 by N information symbols and 2 by N redundant information symbols, said parity symbols being constructed according to:

\[ C_{n-2,i} = \sum_{k=0}^{n-3} C_{k,\langle i+k+2 \rangle_n} \]

\[ C_{n-1,i} = \sum_{k=0}^{n-3} C_{k,\langle i-k-2 \rangle_n} \]

where i = 0, 1, ..., n - 1, and \langle x \rangle_n = x mod n.

29. The network system of claim 1, wherein the storage devices store video information.
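
The two read cases recited in claim 24 can be sketched as a small planning function in Python. This is a minimal sketch, not the claimed method itself: the name `plan_distributed_read`, the use of a set of reachable node ids as the availability parameter, and the `"raw"` / `"redundant"` labels are all assumptions made for illustration. Reconstructing the missing raw data from the redundant data is outside the scope of the sketch.

```python
def plan_distributed_read(node_ids, available):
    """Decide what to read from which node.

    node_ids:  iterable of all system node identifiers.
    available: set of node identifiers currently considered available
               (stands in for the availability parameter of claim 24).
    Returns a dict mapping node id -> tuple of parts to read.
    """
    node_ids = list(node_ids)
    up = [n for n in node_ids if n in available]
    if len(up) == len(node_ids):
        # Every node is available: the raw data alone is sufficient.
        return {n: ("raw",) for n in node_ids}
    # Fewer nodes are available: read raw and redundant data from
    # the nodes that remain so the missing data can be rebuilt.
    return {n: ("raw", "redundant") for n in up}
```

With three nodes `a`, `b`, `c` all available, the plan requests only raw data from each; if `b` is unavailable, the plan requests both raw and redundant data from `a` and `c`.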

Connectivity Protocol State Machine

Initially: t = n

Transition labels:

{tin: nop}
{tok && t<n: ?T; t++}
{tin && t>1: !T; t--} || {tok && t>0: ?T; !T}
{tout: !T; t--} || {tok && t=n: ?T; !T}
{tout: nop}
{tin && t<d: nop} || {tok && t=0: ?T; t++}

Legend:

t: token count
tok: token arrival
tout: time-out event
tin: time-in event
!T: send token
?T: recv token
&&, ||: and, or
nop: no operation

FIG. 9

C_{i,j,k} = (local iface id, remote machine id, remote iface id)
for each physical channel from this machine to all other known machines, where

0 < i ≤ number of local interfaces
0 < j ≤ number of nodes in the system
0 < k ≤ number of remote interfaces

Run comp for all C_{i,j,k}

Create data structure that stores Boolean up/down state for each C_{i,j,k}

Output to data structure

FIG. 10A
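
The table-building step of FIG. 10A can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: `build_channel_table`, its parameter names, and the `probe` callback (standing in for whatever per-channel check the figure runs) are hypothetical, and the 1-based index ranges simply mirror the bounds 0 < i, 0 < j, 0 < k shown in the figure.

```python
from itertools import product


def build_channel_table(num_local_ifaces, num_nodes, num_remote_ifaces, probe):
    """Return {(i, j, k): bool} with the up/down state of every channel.

    (i, j, k) = (local iface id, remote machine id, remote iface id).
    probe(i, j, k) is a caller-supplied check (an assumption of this
    sketch) that returns a truthy value when the channel is up.
    """
    table = {}
    for i, j, k in product(range(1, num_local_ifaces + 1),
                           range(1, num_nodes + 1),
                           range(1, num_remote_ifaces + 1)):
        table[(i, j, k)] = bool(probe(i, j, k))  # True = up, False = down
    return table
```

A dictionary keyed by the channel triple keeps each channel's Boolean state addressable directly, which is one simple way to realize the data structure the figure describes.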

FIG. 12

General problem

Setting: a network of switches and nodes.
Goal: node-to-node communication.
Fault: switch, node, or link failure.

Specific problem

Setting: switches forward packets, nodes do not.
Goal: constant number of isolated nodes.
Fault: switch failure.

FIG. 13